Semi-Parametric Regression Models and Connecticut’s Hospital Costs

Author

  • Jeffrey P. Cohen
Abstract

The application of spatial analysis to hospital costs has been largely ignored, but it deserves attention. Proximity to other hospitals can lead to spatial spillovers, and recognizing spatial effects can change estimates of hospital economies of scale. In this paper we estimate a variety of cost function models that allow for spatial effects, using annual data for each of Connecticut's 30 hospitals over a 10-year period. We consider a variety of semi-parametric regression models as in McMillen and Redfearn (2010). One innovation is that we address both the space and time dimensions in the kernel weights of our panel data semi-parametric regression models. This approach also allows for a general functional form. We find that a life expectancy measure, years above average lifespan, has a negative and significant effect on hospital costs. We also address potential endogeneity of the life expectancy variable through an instrumental variables estimation approach for panel data semi-parametric models, first suggested more generally by Baltagi and Li (2002). Monte Carlo simulations indicate that our estimator performs well. When we address the endogeneity with this instrumental variables semi-parametric regression model, the elasticities of scale estimates are smaller but still significant. We also find that the hospital cost savings for each additional year of patients' life expectancy above the average are approximately $4,700 to $7,300 on average, depending on the choice of bandwidth. This cost reduction ranges from approximately $480 to $35,000, varying by individual hospital and by year.

Introduction

Hospitals in the U.S. are situated in both urban and rural areas. Hospitals are often clustered in urban areas, whereas rural hospitals are more widely dispersed. The spatial pattern of hospital locations can affect estimates of hospital economies of scale.
An understanding of economies of scale for U.S. hospitals is important because, with federal health care reform, greater numbers of Medicaid and uninsured individuals are expected to seek medical treatment. Information on which hospitals operate most efficiently can therefore help policy makers decide where to direct federal and state-level funding. Since focusing exclusively on patient counts rather than on the success rates of hospital treatments can be misguided, it is also crucial to address how improving the "outcomes" of hospital treatment affects hospital costs. Our approach to all of these issues is to estimate semi-parametric regression models of a hospital cost function, including an instrumental variables specification for an endogenous outcomes variable. We also add to the semi-parametric regression literature by focusing on a weighting kernel that includes both a space and a time component.

Cohen et al. (2010), Cohen and Morrison Paul (2008), and Li and Rosenman (2001) discuss the broader literature on hospital cost functions. Early studies, such as Carey and Stefos (1992), do not rely on economic theory in their cost function estimations; they use a linear functional form that appends quadratic and cubic terms but no interaction terms. Some of the early studies find evidence of internal economies of scale, but later papers, such as Li and Rosenman (2001), show the importance of allowing for nonlinearities that the Cobb-Douglas functional form precludes. Cohen and Morrison Paul (2008) model hospitals' spatial effects through a shift variable in the cost function. They argue that hospitals clustered together in urban areas have better access to labor markets, which implies that it may be easier to recruit skilled workers (such as nurses and/or physicians) who are already located nearby.
Such an effect, which O'Sullivan (2010) describes as labor market pooling, was found to be a significant determinant of costs among the 92 hospitals in the State of Washington that Cohen and Morrison Paul examined. Bates and Santerre (2005) estimate a production function for metropolitan statistical areas in the U.S. and find significant evidence of agglomeration among the hospitals in these areas. Both of these studies model the spatial phenomenon directly in the cost or production function. But imposing such structure on the cost or production function may be considered arbitrary, and for this reason we explore how a more general treatment of spatial effects may affect cost estimates. Little attention has been given to spatial econometric models or semi-parametric approaches for hospitals, despite the close link between agglomeration and the spatial locations of hospitals. Exceptions include Mobley et al. (2009), who examine competition among hospitals and allow for an exogenous spatial shift variable, and Moscone and Knapp (2005), who use a spatial econometric model to examine mental health expenditures in the UK, although their analysis is at the metropolitan statistical area rather than the provider level. An alternative approach is a nonparametric or semi-parametric technique, which allows greater flexibility because it does not impose a restrictive functional form as parametric techniques do. Wilson and Carey (2005) follow a nonparametric approach. Their rationale is that the a priori functional form assumptions made by other researchers (such as Cobb-Douglas, Generalized Leontief, or translog) are arbitrary and impose unnecessary restrictions on the estimation.

Approach

Our analysis covers 30 hospitals over the 10-year period 1999-2008 in the state of Connecticut, a small, prosperous state in the northeastern U.S. Figure 1 shows the locations of the 30 hospitals as well as population densities.
It can be seen that in the larger cities, such as Hartford and New Haven, there are clusters of several hospitals, while in the more rural areas of the state hospitals are relatively sparse. This variation in hospital locations motivates our analysis of spatial aspects of hospital costs in Connecticut.

We focus on a hospital cost function and use several approaches to control for spatial heterogeneity. Because the semi-parametric model already allows for nonlinearities, a more nonlinear parametric specification would be redundant. In a short-run cost model the capital stock is assumed fixed, so variable costs depend on a vector of input prices, p (here, wages and the price of other non-capital inputs), and a vector of shift variables, R. These shift factors include the fixed factor, capital (K) [1]; a case-mix index, CMI (or "CASE_MIX"); the percentage of total patient days that are Medicare patient days, MEDICARE_DAYS (MCARE); the percentage of total patient days that are Medicaid patient days, MEDICAID_DAYS (MCAID); the total number of inpatients, YINPAT; and the total number of outpatients, YOUTPAT. Other shift variables included in some of the models (in the vector R) are a time trend (YEAR) and one or more life expectancy variables, such as years above average lifespan overall (LIFE) or years above average lifespan for several specific causes of death. Since we are estimating a short-run model (in which the capital stock is a fixed factor), C represents variable costs. The Cobb-Douglas variable cost function (without the life expectancy variables) takes the form

$$C = c(p, R) = P_1^{\beta_1} P_2^{\beta_2} K^{\beta_3} CMI^{\beta_4} MCARE^{\beta_5} MCAID^{\beta_6} YINPAT^{\beta_7} YOUTPAT^{\beta_8} \exp(u) \tag{1}$$

At this point we assume that u is iid with mean zero and constant variance. Taking the natural log of both sides yields the estimating equation

$$\log C_{i,t} = \beta_1 \log P_{1,i,t} + \beta_2 \log P_{2,i,t} + \beta_3 \log K_{i,t} + \beta_4 \log CMI_{i,t} + \beta_5 \log MCARE_{i,t} + \beta_6 \log MCAID_{i,t} + \beta_7 \log YINPAT_{i,t} + \beta_8 \log YOUTPAT_{i,t} + u_{i,t} \tag{2}$$

We begin by estimating the variable cost function model in (2) by OLS, assuming $u_{i,t}$ is iid with mean zero and constant variance, with zero covariances among and across i and t observations. Since we have data on 30 hospitals (i = 1, 2, ..., 30) for each of 10 years (t = 1, 2, ..., 10), a more general alternative is a semi-parametric regression model, as in McMillen and Redfearn (2010). We control for spatial effects using this approach, and we also address the issue of internal economies of scale. An additional contribution is that we allow life expectancy variables, some of which may be endogenous, to enter the cost function. The semi-parametric approach also lets us control for nonlinearities in the functional form. While other hospital studies, such as Cohen and Morrison Paul (2008) and Li and Rosenman (2001), have used a Generalized Leontief functional form, those researchers did not estimate a spatial panel data model, and such a nonlinear functional form would be relatively difficult to implement with parametric spatial panel data econometric techniques. There are few known semi-parametric or nonparametric studies of hospital costs (Wilson and Carey (2005) is an exception), and we know of no panel data studies in this application that address both the spatial and time dimensions in the kernel weights.

McMillen and Redfearn (2010) estimate a semi-parametric regression model to control for spatial effects. Their model is of the form

$$Y_i = f(Z_i) + X_i \beta + u_i \tag{3}$$

where $f(Z_i)$ represents the nonparametric component, X is the single parametric variable, and u is an iid error term. The focus of a semi-parametric regression model is the estimation of $\beta$ while controlling for the other variables in a nonparametric manner.

[1] Using a capital stock measure, as opposed to a capital price, is appropriate for a short-run cost function model, as has been done in the literature, including by Cohen and Morrison Paul (2011).
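A minimal sketch of the pooled OLS step for (2), written in Python with NumPy, is below. The elasticities used to simulate the data are made-up values, not estimates from this paper, and the function name is hypothetical. An intercept is included, as in the pooled OLS results reported later, and the economies-of-scale check is a simple one-sided t-test of whether each output elasticity is below 1, using conventional (iid) OLS standard errors.

```python
import numpy as np


def fit_log_cobb_douglas(cost, regressors):
    """Pooled OLS of log(cost) on an intercept and the logs of the regressors.

    cost: (n,) positive variable costs.
    regressors: (n, k) positive values, e.g. P1, P2, K, CMI, MCARE, MCAID,
    YINPAT, YOUTPAT in that order. Returns (coefficients, iid standard errors),
    with the intercept first.
    """
    y = np.log(cost)
    X = np.column_stack([np.ones(len(y)), np.log(regressors)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se


# Hypothetical elasticities, used only to simulate 30 hospitals x 10 years.
true_beta = np.array([0.4, 0.3, 0.1, 0.5, 0.05, 0.05, 0.6, 0.3])
rng = np.random.default_rng(0)
n = 300
log_x = rng.uniform(0.0, 1.0, size=(n, true_beta.size))
log_cost = 2.0 + log_x @ true_beta + rng.normal(0.0, 0.05, size=n)
coef, se = fit_log_cobb_douglas(np.exp(log_cost), np.exp(log_x))
beta_hat = coef[1:]

# Output elasticities (YINPAT, YOUTPAT): t-statistics for H0 "elasticity = 1";
# large negative values favour economies of scale.
t_scale = (coef[-2:] - 1.0) / se[-2:]
print(np.round(beta_hat, 2), np.round(t_scale, 1))
```

With real data, `regressors` would hold the eight variables of (2) in the order listed, and the two entries of `t_scale` correspond to YINPAT and YOUTPAT.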
To estimate $f(Z_i)$ we use the Geographically Weighted Regression (GWR) nonparametric estimator, which can be computed by weighted least squares. The advantages of a semi-parametric model over a fully nonparametric one are ease of interpretation and a faster rate of convergence, the latter being particularly important given our sample size. The estimate of $\beta$ measures how the conditional expectation of $Y_{it}$ changes with $X_{it}$ after controlling in a general, nonparametric way for the effects of all other variables.

We first consider the case where both X and Z are exogenous. Following Robinson (1988), taking the expectation of (3) conditional on the variables in the nonparametric component, $Z_{it}$, and subtracting it from (3) gives

$$Y_{it} - E(Y_{it} \mid Z_{it}) = [X_{it} - E(X_{it} \mid Z_{it})]'\beta + u_{it} \tag{4}$$

Using the notation

$$\tilde{Y}_{it} = Y_{it} - E(Y_{it} \mid Z_{it}), \qquad V_{it} = X_{it} - E(X_{it} \mid Z_{it}), \tag{5}$$

equation (4) can be written as

$$\tilde{Y}_{it} = V_{it}'\beta + u_{it} \tag{6}$$

An OLS regression of $\tilde{Y}$ on V then gives a consistent estimator of $\beta$, provided $E(Y_{it} \mid Z_{it})$ and $E(X_{it} \mid Z_{it})$ are known. In practice, these conditional expectations can be approximated using locally weighted regression (LWR), following McMillen and Redfearn (2010) [2]. By rotating each independent variable into the parametric part of the model, X, and leaving the rest of the independent variables in the nonparametric component, f(Z), we obtain an estimate of the marginal impact of each individual factor on hospital costs after controlling for the effects of all other variables in a nonparametric way. We use GWR to approximate the conditional expectations in the discussion above.

[2] McMillen and Redfearn (2010) note that LWR is equivalent to Geographically Weighted Regression (GWR). Accordingly, we refer to LWR and GWR interchangeably in our discussion.
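The double-residual step in (4)-(6), with GWR used for the conditional expectations, can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the author's code. The helper names (`space_time_weights`, `gwr_fitted`, `robinson_beta`) are hypothetical. The Gaussian product kernel, the scaling of distances by their standard deviations, and the one-sided time window follow the weighting scheme of equation (7) and its accompanying notes, with time distance taken as $\tau = t - t_0$. Each bandwidth is set so that a given share of the sample lies within two bandwidths of the target, with that rule applied separately to space and time (an assumption), and the data are simulated.

```python
import numpy as np


def space_time_weights(coords, years, i0, share=0.25):
    """Hypothetical helper: product Gaussian kernel weights for target observation i0.

    Distances in space and time are scaled by their standard deviations, each
    bandwidth is set so that `share` of the sample lies within two bandwidths
    of the target (a k-nn style rule), and only observations in the same or
    earlier years (tau = t - t0 <= 0) receive positive weight.
    """
    d = np.linalg.norm(coords - coords[i0], axis=1)
    tau = (years - years[i0]).astype(float)
    d_n = d / d.std()
    tau_n = tau / tau.std()
    h1 = max(np.quantile(d_n, share) / 2.0, 1e-6)
    h2 = max(np.quantile(np.abs(tau_n), share) / 2.0, 1e-6)
    w = np.exp(-0.5 * (d_n / h1) ** 2) * np.exp(-0.5 * (tau_n / h2) ** 2)
    w[tau > 0] = 0.0  # one-sided kernel in time
    return w


def gwr_fitted(q, Z, coords, years, share=0.25):
    """Approximate E(q | Z) at every observation by a locally weighted linear fit."""
    n = len(q)
    A = np.column_stack([np.ones(n), Z])
    fitted = np.empty(n)
    for i0 in range(n):
        sw = np.sqrt(space_time_weights(coords, years, i0, share))
        coef, *_ = np.linalg.lstsq(A * sw[:, None], q * sw, rcond=None)
        fitted[i0] = A[i0] @ coef
    return fitted


def robinson_beta(y, x, Z, coords, years, share=0.25):
    """Double-residual estimate of beta in Y = f(Z) + X*beta + u, as in (4)-(6)."""
    y_res = y - gwr_fitted(y, Z, coords, years, share)  # Y - E(Y | Z)
    v = x - gwr_fitted(x, Z, coords, years, share)      # V = X - E(X | Z)
    return (v @ y_res) / (v @ v)                        # OLS of Y_res on V


# Simulated panel: 30 hospitals (fixed sites) observed for 10 years.
rng = np.random.default_rng(1)
N, T = 30, 10
coords = np.repeat(rng.uniform(0, 1, size=(N, 2)), T, axis=0)
years = np.tile(np.arange(1, T + 1), N)
Z = rng.uniform(-1, 1, size=(N * T, 2))
x = Z[:, 0] + 0.5 * Z[:, 1] + rng.normal(0, 1, N * T)
y = 1.5 * x + 0.5 * Z[:, 0] ** 2 + Z[:, 1] + rng.normal(0, 0.1, N * T)
beta_hat = robinson_beta(y, x, Z, coords, years, share=0.25)
print(round(beta_hat, 3))
```

Rotating each explanatory variable into the role of `x` (with the others in `Z`) and calling `robinson_beta` once per variable mirrors the rotation described above; `share=1.0` corresponds to the 100% window.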
More specifically, $E(X_{it} \mid Z_{i_0 t_0})$ at a target observation $(i_0, t_0)$ is calculated by minimizing the following objective function with respect to a and b [3]:

$$\sum_{i=1}^{N} \sum_{t=1}^{T} (X_{it} - a - b'Z_{it})^2 \, K(d_{it}/h_1) \, K(\tau_{it}/h_2) \tag{7}$$

where K(·) is a kernel function that determines the weight observation (i, t) receives in the regression; $d_{it}$ and $\tau_{it}$ are the distances between observations (i, t) and $(i_0, t_0)$ in space and in time, respectively [4]; and $h_1$ and $h_2$ are the bandwidths in space and time, respectively. $E(Y_{it} \mid Z_{i_0 t_0})$ is obtained in the same way with $Y_{it}$ in place of $X_{it}$. This approach is appealing because it exploits the panel nature of the data: hospital i in year t is treated as a different observation from hospital i in year t - r (where r = 1, 2, ..., 9 and t = 1, 2, ..., 10), since we add time as an additional dimension. The distance between observations in the time dimension enters through the kernel in $\tau_{it}$ [5]. The Gaussian kernel function is used to calculate the weight assigned to each observation based on its distance from the target point, both in geographic location and in time (year).

[3] The semi-parametric regression approach follows McMillen and Redfearn (2010). For each parametric variable X, we first use GWR to regress Y on Z and X on Z, calculate the residuals $u_Y$ and $u_X$, and then regress $u_Y$ on $u_X$ using OLS. The estimate of $\beta$ is the coefficient on $u_X$ in this second stage. We repeat this process, allowing each explanatory variable in turn to be the sole parametric variable, to obtain semi-parametric estimates of its coefficient and standard error after controlling for any nonlinearities in the model.
[4] The distances $d_{it}$ and $\tau_{it}$ are normalized by the standard deviations of $d_{it}$ and $\tau_{it}$ over all i and t.
[5] The kernel function on time assigns positive weight only for $\tau_{it} \le 0$ and zero weight for $\tau_{it} > 0$; i.e., only observations that precede observation $(i_0, t_0)$ in time (or fall in the same year) receive positive weight.

As McMillen and Redfearn (2010) note, it is well known
that the choice of kernel function tends to have little effect on the results. The performance of kernel estimation is much more sensitive to the choice of bandwidth, h. Because the hospitals in our dataset are densely located in some areas and sparsely in others, a fixed bandwidth would oversmooth where observations are plentiful and undersmooth where data are sparse. Following McMillen and Redfearn (2010), we use a "k nearest neighbor" (k-nn) approach to calculating the bandwidth: for each target point, we choose a bandwidth that includes a fixed percentage of the sample in the local averaging [6].

[6] McMillen and Redfearn (2010) use two "window" sizes (25% and 100%) and a tri-cube kernel function. We follow their approach of using these two bandwidths, since the literature offers little guidance on bandwidth selection in the particular semi-parametric models that we analyze (Ichimura and Todd, 2006). With the Gaussian kernel function (the standard normal density function) that we use in our analysis, the bandwidth is set so that a specified percentage of the sample points lies within two standard deviations of the target point. Sample points outside this window lie in the "tails," receive near-zero weights, and are essentially ignored in the averaging. The two-standard-deviation window for the Gaussian kernel is analogous to the support of [-1, 1] for the tri-cube kernel used by McMillen and Redfearn (2010).

In addition to the variables in the cost function models above, we explore the impact of including an additional set of "output" variables in the R vector in (1), based on life expectancies. Carey and Burgess (1999) included outcomes in the cost function for hospitals; in our context we include life expectancy as an outcome. The life expectancy data are described in the data section below. The life expectancy variable may, however, be endogenous: hospitals that spend more may achieve better outcomes, while hospitals that treat patients who tend to be healthier may have different operating costs. One way in which patients may be healthier is by participating in wellness programs. Although the idea that wellness programs lower medical costs is appealing, few studies report on the actual benefits that accrue from such programs. Ahmed and Rak (2010) report that, in a large wellness program offered by a healthcare company, readmission rates for the musculoskeletal and digestive diagnostic categories were lower among individuals engaged in the program than among those who were not. Individuals not engaged in the wellness program were almost four times more likely to have a hospital readmission than those who participated. Shephard (1999) concluded that studies of work-site wellness programs point to a number of important results, including a reduction in healthcare costs, with benefits estimated at between $500 and $700 per worker per year. Naydeck, Pearson, and Ozminkowski (2008), using a multivariate model with data for the firm Highmark Inc., estimated that yearly overall health care expenses were on average $176 lower for program participants, and inpatient expenses were $182 lower. They conclude that a comprehensive health promotion program can lower the rate of health care cost increases.

To address this potential endogeneity of the life expectancy variable, we follow the approach outlined by Baltagi and Li (2002) for instrumental variable estimation in semi-parametric panel data models. Specifically, Baltagi and Li (2002) consider the semi-parametric model

$$Y_{it} = f(Z_{it}) + X_{it}\beta + u_{it} \tag{7}$$

where $X_{it}$ is of dimension one, $\beta$ is an unknown parameter of main interest, $Z_{it}$ is of dimension d × 1, and $f(Z_{it})$ is a smooth but otherwise unknown function.
They also introduce a kernel function, $K_{it,js} = K((Z_{it} - Z_{js})/b)$, where b is the smoothing parameter [7]. In contrast to our semi-parametric regression model, the Baltagi and Li error term is assumed to follow a one-way error component structure, the same structure that Kapoor et al. use in their spatial econometric panel data model; we instead address the time dimension through the kernel weights. Baltagi and Li outline a feasible instrumental variables generalized least squares (IV-GLS) estimator of the coefficient on the endogenous variable $X_{it}$, which depends on the kernel, among other things. This IV-GLS estimator is then used to obtain a nonparametric estimator of f(Z). We follow the Baltagi and Li (2002) approach, but within a GWR framework and with a Gaussian kernel function. For now we allow only $X_{it}$ to be endogenous, i.e., the endogenous variable is in the parametric part of the model [8].

[7] In the Baltagi and Li (2002) semi-parametric estimation approach, this kernel function can take some of the explanatory variables as its argument (a regular kernel regression), while in the locally weighted regression approach the distance between two observations is generally used as the kernel argument.
[8] If the nonparametric components are endogenous, the asymptotic analysis is more complex, and we are not aware of any kernel method that addresses the issue.

Note that when $X_{it}$ is endogenous, $u_{it}$ and $V_{it}$ in (6) are correlated because of the dependence between $X_{it}$ and $u_{it}$. To obtain a consistent estimate of $\beta$, we need an instrumental variable that is correlated with $V_{it}$ but independent of $u_{it}$. In this case, the life expectancy of a hospital's patients in year t likely depends on the hospital's expenses in the previous year, $t-1$. A hospital's expenses in $t-1$ in turn depend on factors in year $t-1$, such as wages, fixed costs, and others, i.e., all the exogenous variables included in $Z_{t-1}$.
Therefore, we use $m = E(V_{it} \mid Z_{i,t-1})$ as the instrumental variable. The estimator of $\beta$ can then be obtained by IV-OLS [9]:

$$\hat{\beta} = (V'mm'V)^{-1} V'mm'\,(Y - E(Y \mid Z)) = \beta + (V'mm'V)^{-1} V'mm'u \tag{8}$$

After obtaining $\hat{\beta}$, the nonparametric component $f(Z_{it})$ can be estimated by a nonparametric regression of $Y_{it} - X_{it}\hat{\beta}$ on $Z_{it}$, i = 1, ..., N, t = 1, ..., T. A locally weighted estimator of $f(Z_{it})$ can be calculated at each observation point [10]. Since $\hat{\beta}$ converges at the rate $\sqrt{n}$, which is faster than the usual nonparametric convergence rate, replacing $\beta$ with its estimator has minimal impact on the estimation of $f(Z_{it})$.

[9] When Var(u) is known, Baltagi and Li (2002) show that a potentially more efficient estimator of $\beta$ is the IV-GLS estimator. However, in their Monte Carlo simulation the IV-GLS estimator performs worse than the IV-OLS estimator, so we focus on the IV-OLS estimator in our study.
[10] It is possible to conduct F-tests on the significance of the nonparametric estimates, as in McMillen and Redfearn (2010); we have done so and report the results in Tables 7a and 7b. Alternatively, we could report standard deviations of all N × T coefficient estimates, although these cannot be used for statistical inference.

Monte Carlo Simulation

To provide some confidence in the performance of the two-stage estimator for a panel data model with both spatial effects and endogeneity, we developed a simple Monte Carlo experiment based on a stylized model from our data set. In this simulation, we envision a hospital cost model with N hospitals, each observed over T time periods. Hospital costs, $Y_{it}$, i = 1, ..., N; t = 1, ..., T, depend on their past values, $Y_{i,t-1}$, and on an exogenous variable, $Z_{it}$, according to the following data generating process (DGP):

$$Y_{it} = \beta Y_{i,t-1} + \gamma_1 Z_{it} + \gamma_2 Z_{it} + \varepsilon_{it} \tag{9}$$

where we set $\beta = 0.5$ and $\gamma_1 = \gamma_2 = 1$. We also assume $Y_{i0} = 0$. In the DGP in (9), the variable $Z_{it}$ is i.i.d.
from a uniform distribution on [-0.5, 0.5]. The error process, $\varepsilon_{NT} = [\varepsilon_{11}, \ldots, \varepsilon_{NT}]'$, can be written in matrix form as

$$\varepsilon_{NT} = \rho (W_N \mu_N) \otimes e_T + \mu_N \otimes e_T + \nu_{NT} \tag{10}$$

where $\rho$ is a scalar parameter of the first-order spatial autoregressive process; $W_N$ is an arbitrary (known) N × N weight matrix based on geographic distances (a separately generated variable) between observations [11]; $\mu_N$ is an N × 1 vector of random variables following a normal distribution with mean 0 and standard deviation 1/3 [12]; $e_T$ is a T × 1 vector of ones; and $\nu_{NT}$ is an NT × 1 vector of random variables. To examine the performance of the estimator in different scenarios, we allow the model parameters to take the following values: N = 30, 60; T = 10, 20; $\rho$ = 0.25, 0.75; and a correlation among observations on the same subject (i.e., hospital) of Corr = 0.4, 0.7. Note that in our actual data set, N = 30 and T = 10. For each specification we performed 500 repetitions (M = 500).

[11] W is constructed using the product of Gaussian kernels on distance in space and time.
[12] This standard deviation for $\mu_N$ is chosen to have the same scale as the other random variable in the model, Z, which is uniform on [-0.5, 0.5].

The focus of the simulation study is the estimation of $\beta$, which is estimated using equation (6). The conditional expected values used in (6), namely $E(Y_{it} \mid Z_{it})$, $E(X_{it} \mid Z_{it})$, and $E(V_{it} \mid Z_{i,t-1})$, are calculated using GWR. We report in Table 2 the estimated bias, standard deviation (Std), and root mean square error (RMSE) for all model specifications. These quantities are calculated as

$$\mathrm{Bias}(\hat{\beta}) = \frac{1}{M}\sum_{i=1}^{M} (\hat{\beta}_i - \beta) \tag{11}$$

$$\mathrm{Std}(\hat{\beta}) = \Big\{ \frac{1}{M}\sum_{i=1}^{M} \big(\hat{\beta}_i - \operatorname{mean}(\hat{\beta})\big)^2 \Big\}^{1/2} \tag{12}$$

$$\mathrm{RMSE}(\hat{\beta}) = \Big\{ \frac{1}{M}\sum_{i=1}^{M} (\hat{\beta}_i - \beta)^2 \Big\}^{1/2} \tag{13}$$

where i = 1, ..., M indexes the repetitions. In summary, the estimator performs reasonably well in all DGP specifications.
With the true value $\beta = 0.5$, the estimator gives an average bias of 0.013, an average Std of 0.071, and an average RMSE of 0.082 across all specifications. As we increase the number of units (N), all three performance measures decrease, which indicates that the estimator is likely to be consistent in large samples, as suggested by Baltagi and Li (2002). In addition, the Std and RMSE also decrease with the number of time periods (T) over which each unit is observed. When the correlation among observations on the same unit taken at different times increases from 0.4 to 0.7, the estimator tends to produce slightly larger Std and RMSE; as expected, the higher correlation has no significant impact on the estimator's bias. Similarly, when $\rho$ (the first-order spatial autoregressive parameter) increases from 0.25 to 0.75, the standard deviation of the estimator increases, as expected. However, the bias and RMSE of the estimator show a mixed pattern under higher spatial dependence. Figures 2a, 2b through 5a, 5b show the distributions of $\hat{\beta}$ in each of the simulations, for various combinations of assumptions on the time series and spatial autocorrelation parameters. In each pair of diagrams, panel (a) represents the case with N = 30, T = 10, and panel (b) the case with N = 60, T = 20. Comparing panels (a) and (b) shows that when the simulated sample size increases (in both the time and spatial dimensions), the distribution of $\hat{\beta}$ becomes more concentrated around 0.5, the true value. This suggests that our semi-parametric estimator for the endogenous variable is likely to be consistent in large samples.
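A stripped-down version of this Monte Carlo exercise can be sketched in Python with NumPy, combining the DGP in (9)-(10), the double-residual IV-OLS estimator in (8), and the performance measures in (11)-(13). Several simplifications are assumptions of this sketch rather than features of the paper's experiment: a one-dimensional Nadaraya-Watson smoother on Z replaces GWR for the conditional expectations; the spatial weight matrix is a hypothetical row-normalized Gaussian kernel on random hospital sites; the idiosyncratic error has standard deviation 1/3 instead of being calibrated to the Corr values; only N = 30, T = 10, rho = 0.25 is run; and M = 200 rather than 500. The two Z terms in (9) are kept exactly as printed, and all function names are hypothetical.

```python
import numpy as np


def nw_fit(target, cond, h):
    """Nadaraya-Watson estimate of E(target | cond) at each sample point (Gaussian kernel)."""
    u = (cond[:, None] - cond[None, :]) / h
    k = np.exp(-0.5 * u ** 2)
    return k @ target / k.sum(axis=1)


def one_replication(rng, N=30, T=10, beta=0.5, rho=0.25, sd_mu=1 / 3, sd_nu=1 / 3, h=0.1):
    """Simulate (9)-(10) once; return (IV-OLS, OLS) double-residual estimates of beta."""
    # Hypothetical row-normalized spatial weights from random hospital sites.
    xy = rng.uniform(0, 1, size=(N, 2))
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    W = np.exp(-0.5 * (dist / 0.2) ** 2)
    np.fill_diagonal(W, 0.0)
    W /= W.sum(axis=1, keepdims=True)
    mu = rng.normal(0, sd_mu, N)
    # (10): rho (W mu) kron e_T + mu kron e_T + nu, stored as an (N, T) array.
    eps = (rho * (W @ mu) + mu)[:, None] + rng.normal(0, sd_nu, (N, T))
    Z = rng.uniform(-0.5, 0.5, (N, T))
    Y = np.zeros((N, T + 1))  # column 0 holds Y_i0 = 0
    for t in range(T):
        # (9) with gamma_1 = gamma_2 = 1, as printed
        Y[:, t + 1] = beta * Y[:, t] + 1.0 * Z[:, t] + 1.0 * Z[:, t] + eps[:, t]
    # Use periods 2..T so that the lagged Z needed for the instrument exists.
    y, x = Y[:, 2:].ravel(), Y[:, 1:-1].ravel()
    z, z_lag = Z[:, 1:].ravel(), Z[:, :-1].ravel()
    y_res = y - nw_fit(y, z, h)        # Y - E(Y | Z)
    v = x - nw_fit(x, z, h)            # V = X - E(X | Z), with X = Y_{i,t-1}
    m = nw_fit(v, z_lag, h)            # instrument m = E(V | Z_{i,t-1})
    beta_iv = (m @ y_res) / (m @ v)    # (8) with scalar V and m
    beta_ols = (v @ y_res) / (v @ v)   # ignores the endogeneity
    return beta_iv, beta_ols


def mc_metrics(estimates, true_value):
    """Bias, Std and RMSE as in (11)-(13), all with 1/M averaging."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    std = np.sqrt(np.mean((est - est.mean()) ** 2))
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, std, rmse


rng = np.random.default_rng(2024)
draws = np.array([one_replication(rng) for _ in range(200)])
iv_bias, iv_std, iv_rmse = mc_metrics(draws[:, 0], 0.5)
ols_bias, _, _ = mc_metrics(draws[:, 1], 0.5)
print(f"IV: bias={iv_bias:.3f} std={iv_std:.3f} rmse={iv_rmse:.3f}; OLS bias={ols_bias:.3f}")
```

Because $Y_{i,t-1}$ shares the hospital effect $\mu_i$ with $\varepsilon_{it}$, the OLS version of the second step should be biased upward, while the lagged-Z instrument should remove most of that bias. Under the 1/M averaging used here, RMSE² = Bias² + Std² holds exactly for each set of draws.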
Data

The annual data on the 30 individual Connecticut hospitals for 1999-2008, which were also used in Cohen, Gerrish, and Galvin (2010), were obtained from the State of Connecticut Department of Public Health. Descriptive statistics are presented in Table 1a. The labor price (total wages and benefits) and the price of other expenses (excluding labor and depreciation expenses) are each normalized to a base year (1999). The average hospital's property, plant, and equipment value was about $82,000,000 (in 1999 dollars), with a range of $71,000,000 to $256,000,000. The average number of inpatient days was 66,000, and outpatient visits averaged 211,000. Mean variable costs (expenses) were $157,000,000 (in 1999 dollars).

The life expectancy data were also obtained from the State of Connecticut Department of Public Health. Table 1b lists descriptive statistics for years above average life expectancy at birth for all causes of death (LIFE), as well as for several specific causes (cardio, cancer, diabetes, stroke, trauma). These annual life expectancy data for each of the 30 Connecticut hospitals also cover 1999 to 2008. Trauma patients had the worst life expectancy (below average), while cardio patients had the greatest life expectancy in years above average.

Results

In the pooled OLS regression for the Cobb-Douglas functional form, we also include an intercept term. The OLS results are presented in Tables 3a, 3b, and 3d. Both input price variables have positive and significant coefficients. Medicaid has a positive and significant (P = 0.03) coefficient, while Medicare has a positive but insignificant coefficient. Capital, case mix index, outpatients, and inpatients all have positive and significant coefficients; the positive capital coefficient implies that more capital increases costs. The inpatient and outpatient coefficients are also significantly less than 1, implying economies of scale.
In other words, serving more patients would be expected to lower costs per patient. Changes in the case mix also have a significant impact on costs. The YEAR variable is negative and significant, implying the presence of exogenous technical change. Finally, we estimate several specifications that include life expectancy variables: two regressions with years above average age at death (LIFE), and others with several individual causes of death. These life expectancy results are discussed in more detail below.

The OLS estimates are based on the Cobb-Douglas functional form, which is rather inflexible because it does not allow for nonlinearity. A semi-parametric regression approach, such as the McMillen and Redfearn (2010) approach described above, can control for nonlinearity with a more general functional form while still generating parameter estimates and standard errors that can be used for statistical inference. With this semi-parametric approach we omit the constant term, because estimating it would require rotating a variable that always equals 1 into the parametric part of the model, which introduces additional complications. Although we directly address the time dimension in one of the weighting kernels, we also include a time trend (YEAR) and find that it is negative and highly significant in most specifications, which may reflect exogenous technical change. These semi-parametric results are shown in Tables 4a and 4b.

An important issue in estimating the semi-parametric regression models is the choice of bandwidth.
As Ichimura and Todd (2006) discuss, the semi-parametric regression literature provides relatively little guidance on bandwidth choice, and the papers that address the issue focus primarily on special cases (e.g., binary choice models, censored regression models, single index models), none of which apply directly to our semi-parametric regression models. A common approach to bandwidth selection in GWR models, cross-validation, is not as straightforward in a semi-parametric regression framework: unlike a nonparametric analysis, a semi-parametric regression reports only one parameter estimate for all observations on each explanatory variable, so the cross-validation approach is not directly applicable. McMillen and Redfearn (2010) report results for two different bandwidths, 25% and 100%, and we follow their approach, presenting results for both bandwidths in each of our semi-parametric regression specifications. While there is evidence in the GWR literature that estimation results can be sensitive to the bandwidth choice, we find little difference in the signs and significance of our parameter estimates between the two bandwidths. There are, however, some important exceptions: in the regression model where we allow for the endogenous LIFE variable, the significance of the Medicaid variable differs between the two bandwidths, and the YEAR variable is insignificant under both bandwidths.

In the semi-parametric regression models, for both the 25% and 100% bandwidths the signs of all parameter estimates are positive and all are highly significant (Tables 4a, 4b).
Furthermore, the inpatient days parameter is significantly less than 1 in the specifications without the LIFE variable, as well as in those including LIFE or the other life expectancy variables, implying economies of scale for inpatient services. The same is true for the outpatients variable in the models where LIFE is endogenous. In all of the semi-parametric regression specifications, the outpatient visits variable is significantly less than 1 and significantly greater than zero, implying economies of scale for outpatients but significant marginal costs. We also include a time trend (YEAR) to allow for the possibility of exogenous technical change over our sample period. The parameter estimates on the time trend are highly significant in all semi-parametric regression models except the model where LIFE is endogenous. This may imply that, after controlling for all cost determinants, costs fell over time because of exogenous technical change (except in the model where LIFE is treated as endogenous).

As mentioned above, we also have life expectancy data for all years (1999-2008) for all hospitals in the sample. For the models including the LIFE variable, we estimate an OLS specification (Table 3b) and a 2SLS specification, using one-period lagged values of all explanatory variables as instruments for LIFE (Table 3c). For the most part, the signs and significance of the parameter estimates are similar across the OLS and 2SLS models, except that the LIFE parameter estimate is negative and significant in the OLS model but insignificant in the 2SLS model. This insignificance raises the question of how the results might differ with a semi-parametric approach, which we address below.
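The parametric 2SLS specification described above, with one-period lags of the explanatory variables as instruments for LIFE, can be sketched with NumPy as follows. The function name and the simulated data are hypothetical, and the sketch returns point estimates only, since standard errors from a naive second-stage OLS are not valid for 2SLS.

```python
import numpy as np


def two_sls(y, exog, endog, instruments):
    """Two-stage least squares with a single endogenous regressor.

    exog: (n, k) included exogenous regressors (with a column of ones);
    endog: (n,) endogenous regressor, e.g. LIFE; instruments: (n, l) excluded
    instruments, e.g. one-period lags of the explanatory variables.
    Returns coefficients ordered as [exog..., endog] (point estimates only).
    """
    first = np.column_stack([exog, instruments])
    gamma, *_ = np.linalg.lstsq(first, endog, rcond=None)
    endog_hat = first @ gamma                      # first-stage fitted values
    second = np.column_stack([exog, endog_hat])
    coef, *_ = np.linalg.lstsq(second, y, rcond=None)
    return coef


# Hypothetical data: LIFE shares a shock with log cost, so OLS is biased.
rng = np.random.default_rng(7)
n = 2000
u = rng.normal(0, 1, n)
x1 = rng.normal(0, 1, n)                 # an exogenous cost shifter
lagged = rng.normal(0, 1, (n, 2))        # stand-ins for lagged explanatory variables
life = 0.8 * lagged[:, 0] + 0.4 * lagged[:, 1] + 0.5 * u + rng.normal(0, 0.5, n)
log_cost = 1.0 + 0.5 * x1 - 0.3 * life + u
exog = np.column_stack([np.ones(n), x1])
b_2sls = two_sls(log_cost, exog, life, lagged)
b_ols, *_ = np.linalg.lstsq(np.column_stack([exog, life]), log_cost, rcond=None)
print(round(b_2sls[-1], 3), round(b_ols[-1], 3))
```

In this simulated setting the OLS coefficient on LIFE is pulled well away from its true value of -0.3, while the 2SLS estimate stays close to it, which is the contrast the OLS and 2SLS columns are meant to reveal.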
We also re-run the semi-parametric models for a few variations that include life expectancy measures as “outputs”. One variation includes overall lifespan years above average for each hospital. A separate set of models includes individual causes of death (cardio, cancer, diabetes, trauma, and stroke). In Tables 5a and 5b, the parameter on LIFE (modeled as exogenous) is negative and highly significant for both bandwidths in the semi-parametric models. This implies that hospitals whose patients live longer than average also experience significantly lower costs on average. In Tables 6a and 6b, we include years above average lifespan for individual causes of death as outputs in the cost function, assuming each of these cause-of-death variables is exogenous. For the 25% bandwidth, the coefficients on cardio, cancer, trauma, and diabetes are all negative and significant. This implies that hospitals whose patients with these causes of death live longer on average also experience significantly lower costs. Stroke patients, on the other hand, tend to increase hospital costs when they live longer. In contrast, all of the separate cause-of-death variables are positive and significant for the 100% bandwidth. This implies that higher life expectancies for patients who die from each of these diseases also lead to higher hospital costs. Also, the “year” variable is negative and significant for the 25% bandwidth but positive and significant for the 100% bandwidth. These results contrast sharply with a parametric 2SLS estimation that controls for potential endogeneity of these specific categories of life expectancy (Table 3e). In that estimation, the parameter estimates for all of the individual life expectancy variables are insignificant. We are unable to estimate the 2SLS model in a semi-parametric context because the Baltagi and Li approach is intended for a situation with only one endogenous variable.
Perhaps using the parametric approach when controlling for endogeneity of the life expectancy variables leads to biased parameter estimates and/or insignificance of these variables. When we estimate the model with the overall life variable as endogenous (Tables 7a, 7b), the estimate of the parameter on the life variable ranges from -0.0000308 (for the 25% window) to -0.0000474 (for the 100% window). The average hospital in the average year has costs of $157,000,000. For such a hospital, this implies a mean cost reduction of approximately $4,700 (for the 25% window) to $7,400 (for the 100% window) for each additional year of life expectancy above the average. These figures control for the endogeneity of the life expectancy variable and allow for heterogeneity over space and time. We also calculate this cost saving for each hospital in our sample. The range is $484 to $22,620 per additional year of life for the 25% window, and $758 to $35,438 for the 100% window. Clearly, some hospitals benefit much more than others from treating patients with longer life expectancies. In the semi-parametric regression models where LIFE is endogenous, the first step estimates the parameter on the LIFE variable using a two-stage least squares semi-parametric estimator. The remaining coefficients are then estimated nonparametrically. The nonparametric step yields 300 separate coefficient estimates (one per hospital-year observation), so we report the standard deviations of these estimates, and conventional inference is not straightforward. Instead, we perform F-tests of the explanatory power of including each nonparametric variable in the model.
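The conversion from the LIFE coefficient to dollars can be checked with simple arithmetic. The sketch assumes the dependent variable is log total cost and LIFE enters in levels, so that ∂C/∂LIFE = β·C. This assumption is consistent with how the text multiplies the coefficient by average cost. With the reported $157,000,000 average cost, the product gives roughly $4,800 (25% window) and $7,400 (100% window) in savings per additional year. The 25% figure is slightly above the approximately $4,700 reported, a gap that may reflect rounding in the reported inputs. The function names are illustrative.

```python
import math

def cost_change_per_life_year(beta_life, cost):
    """Marginal effect of one more year of LIFE on cost, in dollars.

    Assumes ln(cost) = ... + beta_life * LIFE + ..., so dC/dLIFE = beta_life * cost.
    """
    return beta_life * cost

def exact_cost_change(beta_life, cost):
    """Exact change for a discrete one-year increase: cost * (exp(beta) - 1)."""
    return cost * math.expm1(beta_life)

mean_cost = 157_000_000  # average hospital-year cost reported in the text
for window, beta in [("25%", -0.0000308), ("100%", -0.0000474)]:
    saving = -cost_change_per_life_year(beta, mean_cost)
    print(window, round(saving), round(-exact_cost_change(beta, mean_cost)))
# prints:
# 25% 4836 4836
# 100% 7442 7442
```

Because the coefficients are tiny, the marginal approximation β·C and the exact one-year change C·(e^β − 1) agree to well under a dollar.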
The version of the F-test here differs slightly from the one described in McMillen and Redfearn (2010). Here, we replace $Y$ with $(Y - X\hat{\beta})$, where $X\hat{\beta}$ is the LIFE term estimated in the first (2SLS) step. We then re-estimate the model using locally weighted regressions for the nonparametric variables, $z$, while omitting the variable for which we desire the F-statistic. Tables 7a and 7b present the F-statistics for the exogenous variables, as well as the t-statistic for the (endogenous) LIFE variable. The LIFE variable is negative and highly significant (p-value = 0.00000). The F-tests also imply that the marginal costs of treating inpatients and outpatients are significantly greater than zero. These inpatient and outpatient treatment cost parameters are significantly less than 1, implying economies of scale. Perhaps this is due to under-utilized capacity at most hospitals in many years, which would lead to highly significant economies of scale in both inpatient and outpatient services. With the exception of the YEAR, case-mix, and Medicaid days variables, all other variables in the semi-parametric regressions with endogenous LIFE are jointly significant for both bandwidths based on the F-tests.

Conclusions

We estimate a cost function model for all hospitals in the state of Connecticut, USA, for each year in the period 1999-2008. Our approaches include least squares, semi-parametric regressions, and semi-parametric regressions in the presence of an endogenous variable. The semi-parametric regression approach allows for a general functional form that is less restrictive than Cobb-Douglas. We also introduce an estimator for the situation where life expectancy is considered endogenous. Our Monte Carlo simulations suggest that this semi-parametric regression model with an endogenous variable performs well. We incorporate a time trend in the regressions, and its coefficient is negative and significant in many of our models, which is indicative of exogenous technical change.
To fully leverage the space-time panel nature of our data set, we also introduce a kernel structure that incorporates both the time and space dimensions of our data. This is a contribution to the literature on semi-parametric regressions, particularly in the hospital context. The “inpatient days” and “outpatient visits” variables are significant in most variations of the model, and we find evidence of economies of scale in all models. Adding the life expectancy variables to the semi-parametric regression models leads to smaller economies of scale estimates than in the models without them. These findings suggest that assessing the determinants of hospital costs requires three elements: a general functional form through semi-parametric regression models, a panel (space-time) data framework, and allowance for endogeneity of life expectancy. Ignoring the spatial elements of the data, which the GWR model inherently controls for, and failing to address the potential endogeneity of the LIFE variable can lead to incorrect inferences about the relationship between LIFE and total hospital operating costs.

Similar sources

Semi-parametric Quantile Regression for Analysing Continuous Longitudinal Responses

Recently, quantile regression (QR) models are often applied for longitudinal data analysis. When the distribution of responses seems to be skew and asymmetric due to outliers and heavy-tails, QR models may work suitably. In this paper, a semi-parametric quantile regression model is developed for analysing continuous longitudinal responses. The error term's distribution is assumed to be Asymmetr...

Evaluation of Survival Analysis Models for Predicting Factors Influencing the Time of Brucellosis Diagnosis

Background: Brucellosis or Malta fever is one of the most common zoonotic diseases in the world. In addition to causing human suffering and dire economic impact on animals, due to the high prevalence of Brucellosis in the western regions of Isfahan province, this study aimed to analyze effective factors in the time of Brucellosis diagnosis using parametric and semi-parametric mo...

Application of semi-parametric single-index two-part regression and parametric two-part regression in estimation of the cost of functional gastrointestinal disorders

AIM For the purpose of cost modeling, the semi-parametric single-index two-part model was utilized in the paper. Furthermore, as functional gastrointestinal diseases which are well-known as common causes of illness among the society people in terms of both the number of patients and prevalence in a specific time interval, this research estimated the average cost of functional gastrointestinal d...

Comparison of Cox regression and parametric models in the survival analysis of patients with gastric cancer

Background & Objectives: Although Cox regression is commonly used to detect relationships between patient survival and demographic/clinical variables, there are situations where parametric models can yield more accurate results. The objective of this study was to compare two survival regression methods, namely Cox regression and parametric models, in patients with gastric carcinoma registered a...

Semi-parametric single-index two-part regression models

In this paper, we proposed a semi-parametric single-index two-part regression model to weaken assumptions in parametric regression methods that were frequently used in the analysis of skewed data with additional zero values. The estimation procedure for the parameters of interest in the model was easily implemented. The proposed estimators were shown to be consistent and asymptotically normal. ...

Publication date: 2013